The dataset used in this test case is the Oxford Parkinson's Disease Telemonitoring Dataset.
Reference: A Tsanas, MA Little, PE McSharry, LO Ramig (2009); 'Accurate telemonitoring of Parkinson’s disease progression by non-invasive speech tests', IEEE Transactions on Biomedical Engineering (to appear).
Import Data and Visualize Table Structure

The dataset comprises 16 biomedical voice measurements taken from 42 patients; approximately 200 measurements were taken per patient. Some studies report that vocal impairment after the onset of the disease is prevalent in 70-90% of patients.
The aim is to predict the clinician's Parkinson's disease symptom score on the Unified Parkinson's Disease Rating Scale (UPDRS).
In [1]:
import numpy as np
import pandas as pd
import os
from pandas import DataFrame
from pandas import read_csv
from numpy import mean
from numpy import std
import matplotlib.pyplot as plt
import matplotlib
%matplotlib inline
matplotlib.style.use('ggplot')
import seaborn as sns
results = read_csv('parkinsons_updrs.csv')
results.head()
Out[1]:
In [2]:
data = [results['motor_UPDRS'].describe(),results['total_UPDRS'].describe()]
df = pd.DataFrame(data)
df.round(2)
Out[2]:
The basic statistics suggest larger variability in the total_UPDRS index than in motor_UPDRS.
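Because total_UPDRS also sits on a larger scale than motor_UPDRS, a larger standard deviation does not by itself mean a larger relative spread. One way to check this is the coefficient of variation (standard deviation divided by mean). The sketch below uses a hypothetical toy frame, not values from parkinsons_updrs.csv; the names `toy` and `summary` are illustrative only.

```python
import pandas as pd

# Hypothetical toy values, NOT taken from parkinsons_updrs.csv.
toy = pd.DataFrame({
    "motor_UPDRS": [10.0, 20.0, 30.0],
    "total_UPDRS": [15.0, 30.0, 45.0],
})

summary = pd.DataFrame({
    "mean": toy.mean(),
    "std": toy.std(),  # sample standard deviation (ddof=1)
})
# Coefficient of variation: spread relative to the mean.
summary["cv"] = summary["std"] / summary["mean"]
print(summary.round(2))
```

In this toy example the second column has the larger standard deviation but the same coefficient of variation, so the two measures can lead to different conclusions. Applying the same three lines to `results[['motor_UPDRS', 'total_UPDRS']]` would show which case holds for the real data.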
The next cell gathers additional statistics (median, skewness, and kurtosis) for both indices into a single table.
In [3]:
other_Stats = {
    'Median': [results['motor_UPDRS'].median(), results['total_UPDRS'].median()],
    'Skew': [results['motor_UPDRS'].skew(), results['total_UPDRS'].skew()],
    'Kurtosis': [results['motor_UPDRS'].kurt(), results['total_UPDRS'].kurt()],
}
df1 = pd.DataFrame(other_Stats, index=['motor_UPDRS', 'total_UPDRS'])
df1.round(2)
Out[3]:
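When reading the skewness and kurtosis columns, it helps to recall the conventions pandas uses: `skew()` is 0 for a symmetric sample and positive for a long right tail, and `kurt()` reports excess kurtosis (Fisher's definition, so a normal distribution scores about 0). A minimal sketch on two hypothetical series, unrelated to the dataset:

```python
import pandas as pd

# Hypothetical samples chosen only to illustrate the sign conventions.
symmetric = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
right_tailed = pd.Series([1.0, 1.0, 1.0, 2.0, 10.0])

print(symmetric.skew())     # symmetric sample: skewness of 0
print(symmetric.kurt())     # flat sample: negative excess kurtosis
print(right_tailed.skew())  # one large value on the right: positive skewness
```

A negative kurtosis value in the table therefore indicates a flatter-than-normal distribution rather than an error.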
In [4]:
plt.subplot(1, 2, 1)
plt.hist(results["motor_UPDRS"],color = "skyblue")
plt.xlabel('Motor_UPDRS Index')
plt.ylabel('Frequency')
plt.subplot(1, 2, 2)
plt.hist(results["total_UPDRS"],color = "green")
plt.xlabel('Total_UPDRS Index')
plt.show()
In [5]:
data1 = [results['motor_UPDRS'],results['total_UPDRS']]
fig, ax = plt.subplots(figsize=(5, 5))
plt.boxplot(data1)
ax.set_xlabel('motor_UPDRS, total_UPDRS')
ax.set_ylabel('Response')
plt.show()
In [6]:
# seaborn >= 0.9: factorplot was renamed to catplot, and size to height
ax = sns.catplot(x="age", y="motor_UPDRS", col="sex", data=results, kind="box", height=3, aspect=2)
ax = sns.catplot(x="age", y="total_UPDRS", col="sex", data=results, kind="box", height=3, aspect=2)
In [7]:
#groupby_subject= results.groupby('subject#')
sns.catplot(x='subject#', y='motor_UPDRS', hue='age', col='sex', data=results, kind="swarm", height=3, aspect=3);
sns.catplot(x='subject#', y='total_UPDRS', hue='age', col='sex', data=results, kind="swarm", height=3, aspect=3);
In [9]:
tab_1 = pd.crosstab(index=results["subject#"], columns="count")
print(tab_1)
tab_2 = pd.crosstab(index=results["age"], columns="count")
plt.hist(results['age'], color="violet")
plt.ylabel('Qty of observations');
plt.xlabel('Age')
plt.show()
print(tab_2)
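The one-way frequency tables above can also be produced with `value_counts()`, which returns a Series instead of a one-column DataFrame. The sketch below compares the two on a hypothetical toy frame (the ages are made up, not taken from the dataset):

```python
import pandas as pd

# Hypothetical ages, used only to compare the two counting approaches.
toy = pd.DataFrame({"age": [72, 58, 72, 65, 58, 72]})

# One-way frequency table, as in the cell above.
tab = pd.crosstab(index=toy["age"], columns="count")

# Same counts as a Series, sorted by age rather than by frequency.
counts = toy["age"].value_counts().sort_index()

print(tab)
print(counts)
```

Both give the same numbers; `crosstab` is the more natural choice when a second variable (for example `sex`) is later added as columns.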